@AKHIL-149
Contributor

Summary

Fixes #63314: pivot_table creates duplicate index entries on Python 3.14 with numpy 1.26.

Tracked down the actual bug. It wasn't in compress_group_index like I thought; it's numpy's searchsorted that's broken with this version combo.

What was happening

  • unstack uses searchsorted to build the compressor array
  • with py3.14 + numpy 1.26, searchsorted returns duplicate values instead of unique positions
  • this causes multiple different index values to map to the same output row
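To make the compressor step concrete, here is a minimal sketch (not the pandas source; the array values and the names `comp_index`, `ngroups`, `via_searchsorted`, and `via_unique` are made up for illustration). On a sorted array of group ids, `searchsorted` over `np.arange(ngroups)` should return the first position of each group, and `np.unique(..., return_index=True)` should give the same positions on a working install:

```python
import numpy as np

# Illustrative data: sorted group ids, one per input row (assumed, not from the issue).
comp_index = np.array([0, 0, 1, 1, 1, 2, 3, 3])
ngroups = 4

# searchsorted route: first position at which each group id occurs.
via_searchsorted = comp_index.searchsorted(np.arange(ngroups))

# np.unique route: return_index gives the first occurrence of each unique value.
_, via_unique = np.unique(comp_index, return_index=True)

# On a healthy install both routes agree and every position is distinct.
assert len(set(via_searchsorted.tolist())) == ngroups
assert np.array_equal(via_searchsorted, via_unique)
```

If the `searchsorted` route returned repeated positions, several distinct group ids would point at the same row, which matches the duplicate-index symptom described above.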

The fix

Fall back to the np.unique approach when running on Python 3.14 with numpy < 2.0. This is the same method the non-sorted path already uses, so it's already tested.
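A rough sketch of what such a version gate could look like. The helper name `needs_unique_fallback` is hypothetical and not taken from the diff, and treating "Python 3.14" as "3.14 or newer" is an assumption:

```python
import sys


def needs_unique_fallback(py_version: tuple, numpy_version: str) -> bool:
    """Hypothetical helper: True for the Python 3.14 + numpy < 2.0 combination."""
    # Only the major component of the numpy version string matters here.
    numpy_major = int(numpy_version.split(".")[0])
    # Assumption: ">= 3.14" rather than "== 3.14"; the actual patch may differ.
    return tuple(py_version[:2]) >= (3, 14) and numpy_major < 2


# Example with a hard-coded numpy version string for illustration;
# real code would pass np.__version__ instead.
use_fallback = needs_unique_fallback(sys.version_info, "1.26.4")
```

Keeping the check in one small function means the affected combination is named in a single place and can be dropped once numpy 1.x no longer needs to be supported.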

Testing

Tested with the reproduction case from the issue (100k rows, 3 metrics). The output is correct now.

Found the real issue: searchsorted is broken with Python 3.14 + numpy 1.26. It's not compress_group_index; it's the compressor calculation in unstack, which uses searchsorted.

Just fall back to the unique/return_index approach for this combo, same as what the non-sorted path does.

Works with 100k rows now.
@AKHIL-149
Contributor Author

pre-commit.ci autofix

Development

Successfully merging this pull request may close these issues.

BUG: large pivot_table has incorrect output with Python 3.14
